UKNOF5: Richard Clayton – Content Filtering
Just popped in to the 5th UK Network Operators Forum to hear ORG Advisory Council member Richard Clayton talk on content filtering. Here are my notes:
Overview
- content blocking system taxonomy
- overblocking and other problems
- avoiding the blocking altogether
- attacking the blocking system
- CleanFeed and the ‘oracle attack’
- the IWF web site list
- the political landscape
Taxonomy
Three ways of blocking content:
- DNS poisoning; you arrange for your DNS server to provide the wrong results, so when you look up, say, lolita.com you are sent to the wrong site and will not find the content you’re looking for. Low cost, highly scalable. Can block an indefinite number of domains.
- Blackhole routing; dropping the packets to the bad site. Also low cost, but limited, so will not scale.
- Proxy filtering; arrange that all web traffic goes through a web proxy. High cost, but very accurate and allows you to pick out exactly what you want to block.
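To make the difference in precision concrete, here is a minimal sketch of what each of the three mechanisms actually keys on. The domains, IP addresses and the `HOSTING` table are made up for illustration (reserved example names and documentation address ranges); nothing here comes from the talk beyond the three-way taxonomy itself.

```python
from urllib.parse import urlsplit

# Hypothetical hosting table: domain -> IP address (documentation-range IPs).
HOSTING = {
    "bad.example": "192.0.2.10",
    "innocent.example": "192.0.2.10",   # shares a server with bad.example
    "elsewhere.example": "198.51.100.7",
}

TARGET = "http://bad.example/banned/page.html"


def dns_blocked(url, blocked_domains):
    """DNS poisoning: the decision is made on the domain name alone."""
    return urlsplit(url).hostname in blocked_domains


def blackhole_blocked(url, blocked_ips):
    """Blackhole routing: the decision is made on the destination IP alone."""
    return HOSTING.get(urlsplit(url).hostname) in blocked_ips


def proxy_blocked(url, blocked_urls):
    """Proxy filtering: the decision is made on the full URL."""
    return url in blocked_urls


def verdicts(url):
    """Return what each mechanism would do with this URL."""
    return {
        "dns": dns_blocked(url, {"bad.example"}),
        "blackhole": blackhole_blocked(url, {"192.0.2.10"}),
        "proxy": proxy_blocked(url, {TARGET}),
    }


if __name__ == "__main__":
    for url in (TARGET, "http://bad.example/ok.html", "http://innocent.example/"):
        print(url, verdicts(url))
```

In this toy setup only the proxy leaves the other pages on bad.example alone, and only the blackhole catches innocent.example, which is the cost/accuracy trade-off the list above describes.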
Problems with DNS poisoning
People think it’s easy, but if there are sub-domains you don’t wish to block, or if you want to allow email but not web traffic, then it’s not good enough. Take the West German ISPs, where local government requires them to block access to Nazi sites: most made a mess of it, blocking some parts of the sites but not the bits they were supposed to block, and all of them managed to mess up the email. Every ISP made at least one mistake.
Blackhole routing
Dropping packets affects every web site hosted at the IP address, so you can’t block a single site that shares its IP address with others. That makes it useless for sites like Geocities, and for huge numbers of other sites, because you do not have one IP address per web site. Ben Edelman did a study on ‘overblocking’, and 87.3% of the sites shared an IP address with at least one other. Some web servers have over 50 sites on them. So you end up blocking innocent sites as well.
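The kind of figure quoted above (the share of sites that sit on a shared IP address) is easy to compute once you have a domain-to-IP mapping. The sketch below is only an illustration of that arithmetic on made-up data; it is not Edelman's method or dataset, and the function names are my own.

```python
from collections import Counter


def shared_ip_fraction(domain_to_ip):
    """Fraction of domains whose IP address hosts at least one other domain."""
    if not domain_to_ip:
        return 0.0
    per_ip = Counter(domain_to_ip.values())
    shared = sum(1 for ip in domain_to_ip.values() if per_ip[ip] > 1)
    return shared / len(domain_to_ip)


def collateral(domain_to_ip, target):
    """Domains that would also vanish if the target's IP were blackholed."""
    ip = domain_to_ip[target]
    return sorted(d for d, addr in domain_to_ip.items() if addr == ip and d != target)


# Made-up sample: four domains on one shared server, one on its own address.
SAMPLE = {
    "bad.example": "192.0.2.10",
    "shop.example": "192.0.2.10",
    "blog.example": "192.0.2.10",
    "club.example": "192.0.2.10",
    "solo.example": "198.51.100.7",
}

if __name__ == "__main__":
    print(shared_ip_fraction(SAMPLE))
    print(collateral(SAMPLE, "bad.example"))
```

Blackholing the address of bad.example in this sample takes three unrelated sites down with it, which is the overblocking problem in miniature.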
Proxy filtering
No overblocking, but it is expensive. There are costs in kit and in customer satisfaction, because proxies are slower and customers don’t like that, and a proxy can mess up the ability to tell people apart. Not good news for users, but proxies are the best way of doing precise blocking.
Avoidance for clients
Some people don’t like being blocked and there are tricks for getting round it:
- use a different DNS server, very easy
- use IP addresses instead of the domain name
- use a relay, which often encrypts and anonymises; lots of these services are out there, marketed to people who want to browse from their office desk, but they work just as well from home to get around blocks from the ISP
- encode requests (e.g. ‘request%73’ = ‘requests’) to avoid recognition; just look at spam for this. It is far more complex than it seems to just block domains
- send malformed HTTP requests, e.g. multiple Host headers
Avoidance for servers
- move your site to another IP address, which is easy
- change the port number, which is a bit trickier because we don’t have good systems for looking up port numbers
- provide the same content on many different URLs: you can send out your spam and arrange that lolita.com is constant but then append a random string (which also allows you to check which of your spam emails works best), as some blockers don’t realise that what comes after the / is irrelevant and end up blocking the whole URL, not the domain name
- accept unusually formatted requests
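From the filter's side, the encoding and random-string tricks are a matching problem: a blocker that compares raw strings misses anything that is spelled differently. Here is a rough sketch of the contrast, using Python's standard `urllib.parse`; the blocklist entries and the `canonical_match` helper are hypothetical and only meant to show why exact-string matching is weak.

```python
from urllib.parse import urlsplit, unquote


def naive_match(url, blocklist):
    """Exact string comparison, as a simplistic blocker might do."""
    return url in blocklist


def canonical(url):
    """Reduce a URL to (lower-cased host, percent-decoded path)."""
    parts = urlsplit(url)
    host = parts.hostname or ""          # urlsplit already lower-cases this
    path = unquote(parts.path) or "/"
    return host, path


def canonical_match(url, blocked_prefixes):
    """Match on host plus decoded path prefix, ignoring whatever follows."""
    host, path = canonical(url)
    return any(host == h and path.startswith(p) for h, p in blocked_prefixes)


NAIVE_LIST = {"http://bad.example/requests"}
PREFIX_LIST = {("bad.example", "/requests")}

if __name__ == "__main__":
    for url in (
        "http://bad.example/requests",
        "http://bad.example/request%73",        # %73 is an encoded 's'
        "http://BAD.example/requests/x9f3k",    # random suffix, odd case
    ):
        print(url, naive_match(url, NAIVE_LIST), canonical_match(url, PREFIX_LIST))
```

Even the canonical version only handles the two tricks it was written for; relays, alternative DNS servers and moved sites are untouched by it, which is rather the point of the lists above.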
BT CleanFeed
CleanFeed is BT’s internal name; externally it is called the ‘anti-child-abuse initiative’. It is a two-stage system from 2004, and similar designs are used by other ISPs.
- the first stage is IP address based: it checks whether the destination might host child pornography, and if so the traffic is redirected to a proxy
- the second stage is the proxy, which matches URLs
- this is what’s publicly known, not covered by NDA
Users send their traffic to the boundary of BT’s network. BT’s system decides which traffic is good, and sends it on its way. If it might be going somewhere bad, it goes to their proxy, which then decides if it’s going to a bad site or somewhere innocent. If it is going somewhere bad, the proxy returns a 404, i.e. no accusations of wrongdoing.
Fragile
- evading either stage evades the system, so all previous attacks continue to be relevant
- plus you can attack the system in new ways, e.g. if the DNS results for bad sites include IP addresses for innocent sites, like Google or the iTunes Music Store, that will flood the second stage with legitimate traffic
- if they give it a local IP address then that results in routing loops
The oracle attack
- you can detect the first stage and so can tell which IP addresses are being blocked. If you send lots of tcp/80 traffic you can see what comes back and tell whether your traffic is being redirected. Then you can find out which domain names are hosted at these IP addresses.
The Internet Watch Foundation (IWF)
- set up in 1996 to deal with child porn on Usenet
- operates a consumer hot-line for reports
- mainly concerned with web sites now
- has a database of sites not yet removed
- but sites move around very fast, and the database needs to be regularly reviewed
Politics
- in Whitehall they thought it was impossible to censor or block the net until BT deployed CleanFeed, despite blocking systems in Norway, Saudi Arabia and China, for example
- ISPA claim 80% of consumers are covered by systems that block illegal child images
- the Minister now wants all broadband to block by end 2007
- which is apparently voluntary, but ‘if it appears that we are not going to meet our target through co-operation, we will review the situation’
Whitehall comprehension?
- “recently it has become technically feasible for ISPs to block home users’ access to web sites irrespective of where in the world they are hosted”
- they don’t understand the cost of the systems, how fragile they are, how easy they are to evade, or how they can be attacked or made less secure or less stable. They also don’t understand that you can use the system to reverse engineer a list of sites to look at.
After the events in August, Frattini (EU) wants the internet to be a ‘hostile environment’ for terrorists: “very important to explore further possibility of blocking web site that incite to commit terrorist action”
- also block drugs, gambling, holocaust denial
- don’t overlook civil cases: defamation, copyright material, MI6 agent list, industrial secrets, lists of company directors, etc.
People will want web sites blocked. People used to think ‘it’s not possible’, but now they are saying it is, and the more people think it’s possible the more they want it.
More on this in Richard’s PhD thesis, Chapter 7, which is available on his site.
Biggest problem country is actually the USA: they are not good at removing pedophile material from the internet.
How big is the IWF database? 888 items? You can infer things from what the IWF publish: they have said 38% of sites are still active after 2 months, so they are checking it.
There is a problem with doing research into the blocking of child porn because, of course, looking at the sites is illegal, so you can’t check the content. Only a small percentage of sites reported to the IWF actually have child porn.
IWF and BT refused to allow Richard to have his site added to their blacklist so that he could check to see how well the system works.
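For anyone trying to picture the two-stage design from the BT CleanFeed notes above, here is a toy model of the decision flow as I understood it from the talk. It is a sketch built from the public description only, not BT's implementation; the IP addresses, URLs and the `handle_request` name are all invented for illustration.

```python
# Stage 1 list: destination IPs that *might* host blocked content (made up).
SUSPECT_IPS = {"192.0.2.10"}

# Stage 2 list: exact URLs the proxy refuses to serve (made up).
BLOCKED_URLS = {"http://bad.example/banned/page.html"}


def handle_request(dest_ip, url):
    """Return which path a request takes through the two stages.

    "direct"  - stage 1 lets it through, the proxy never sees it
    "proxied" - stage 1 diverts it, the proxy finds the URL innocent
    "404"     - stage 1 diverts it, the proxy matches the URL and returns 404
    """
    if dest_ip not in SUSPECT_IPS:
        return "direct"
    if url in BLOCKED_URLS:
        return "404"
    return "proxied"


if __name__ == "__main__":
    blocked = "http://bad.example/banned/page.html"
    print(handle_request("192.0.2.10", blocked))                       # both stages match
    print(handle_request("192.0.2.10", "http://innocent.example/"))    # shared server
    print(handle_request("198.51.100.7", blocked))                     # site moved to a new IP
```

The model shows the two weaknesses listed under "Fragile": a request that gets past either check is not blocked, and every innocent request to a suspect IP still has to go through the proxy. It also shows why the oracle attack is possible in principle, since "direct" and "proxied" are different paths that an outside observer may be able to tell apart.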